Add /v1/project/batchDelete API method that deletes with SQL #4383

Open

mikael-carneholm-2-wcar wants to merge 4 commits into master

Conversation

mikael-carneholm-2-wcar

Description

This patch adds a /v1/project/batchDelete method that makes project life-cycle management more effective since multiple projects can be deleted per request.

Addressed Issue

This fixes #3361

Additional Details

This is a modified version of #3407 that uses pure SQL to do the deletion instead, which makes it much faster:

  • The previous implementation was able to delete 61 projects/second (best case, with a batch size of 1000)
  • This implementation has been verified to delete 430 projects/second (with a batch size of 1000)

A caveat is that it uses SQL Common Table Expressions (CTEs), which have been part of the ANSI standard since SQL:1999 but are still not supported by all DB vendors. The H2 database used for the unit tests has experimental support for CTEs, but does not support CTE DELETE statements. For this reason, the code checks the name of the JDBC driver and only runs certain statements if the driver is org.postgresql.Driver.
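For illustration, a minimal sketch of how such a driver-conditional CTE DELETE could look is shown below. The helper class, the table and column names, and the use of the database product name from connection metadata (rather than the driver class name the PR actually checks) are assumptions for the sake of the example, not code from this PR.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

final class CteDeleteSketch {

    // Deletes rows referencing the given projects with a CTE, guarded so the
    // statement only runs against PostgreSQL. Table and column names are
    // illustrative, not the PR's actual SQL.
    static void deleteProjectProperties(final Connection conn, final UUID[] uuids) throws SQLException {
        final boolean isPostgres = "PostgreSQL"
                .equalsIgnoreCase(conn.getMetaData().getDatabaseProductName());
        if (!isPostgres) {
            return; // fall back to ORM-based deletion on databases without CTE DELETE support
        }
        final String sql = """
                WITH doomed AS (
                    SELECT "ID" FROM "PROJECT" WHERE "UUID" = ANY(?)
                )
                DELETE FROM "PROJECT_PROPERTY"
                 WHERE "PROJECT_ID" IN (SELECT "ID" FROM doomed)
                """;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // Bind the whole UUID batch as a single SQL array parameter
            ps.setArray(1, conn.createArrayOf("uuid", uuids));
            ps.executeUpdate();
        }
    }
}
```

Handling the whole UUID array in one statement per table is presumably where the throughput gain over per-object ORM deletion comes from.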

Benchmarks using a local sample DB populated with ~11000 projects imported from a production system:

batch_size      time
100             0m34.193s
200             0m27.759s
500             0m31.075s
1000            0m25.451s

NB: The effect of the batch size is limited by the checkpoint configuration (see https://www.enterprisedb.com/blog/basics-tuning-checkpoints), but in this test the container DB was running with a default configuration.

Checklist

  • [x] I have read and understand the contributing guidelines
  • [ ] This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • [x] This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • [ ] This PR introduces changes to the database model, and I have added corresponding update logic
  • [ ] This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

)
@ApiResponses(value = {
        @ApiResponse(responseCode = "204", description = "Projects removed successfully"),
        @ApiResponse(responseCode = "207", description = "Access is forbidden to the projects listed in the response"),
Member

Not sold on this status code. Some surface-level research leads me to believe using this outside of WebDAV is not intended: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/207

Wouldn't it be better to atomically fail the request if deletion of one or more projects failed?

Author

Fair point, the 207 indeed seems to be reserved for WebDAV. The reason for implementing it this way was based on the outcome of this discussion, where the suggestion by sebD was to return a 204 with a body containing the list of inaccessible projects. The 207 was my suggestion, and since I got a thumbs up I thought I would do it the same way in this PR. I see value in letting the client know which projects are inaccessible, so I'm leaning towards going with sebD's suggestion instead of just returning a 401. Would that be an acceptable solution?

Member

I think sebD's original comment was geared towards the situation where we proceed with deleting accessible projects when there have been inaccessible projects in the provided list. In that case you do need to communicate the partial success somehow.

My point is that no projects should be deleted at all, if at least one of them is inaccessible. You can then still respond with 401, and you could use ProblemDetails to communicate the inaccessible projects in a machine-readable way.

We do this for tag endpoints already:

Author

Ok, I'll look into using ProblemDetails since that will make it consistent with how failed tag deletion is handled. Thanks for the tip!
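For reference, a rough sketch of the all-or-nothing behaviour suggested above, returning an RFC 9457 style problem document that lists the inaccessible projects. The field names and the plain Map entity are illustrative assumptions; the actual implementation would reuse the project's existing ProblemDetails handling, as the tag endpoints already do.

```java
import jakarta.ws.rs.core.Response;

import java.util.List;
import java.util.Map;
import java.util.UUID;

final class BatchDeleteProblemSketch {

    // Build a single failure response instead of deleting a subset of projects.
    // The 401 status follows the discussion above; the body is a generic
    // problem+json document, not Dependency-Track's actual ProblemDetails class.
    static Response accessDenied(final List<UUID> inaccessibleUuids) {
        final Map<String, Object> problem = Map.of(
                "status", 401,
                "title", "Project access denied",
                "detail", "Access is forbidden to one or more of the requested projects; no projects were deleted",
                "inaccessibleProjects", inaccessibleUuids);
        return Response.status(401)
                .type("application/problem+json")
                .entity(problem)
                .build();
    }
}
```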

@ApiResponse(responseCode = "401", description = "Unauthorized")
})
@PermissionRequired(Permissions.Constants.PORTFOLIO_MANAGEMENT)
public Response deleteProjects(List<UUID> uuids) {
Member

Consider using @Size(min = 1, max = ...) here. The limits should then also appear in the OpenAPI spec automatically.

I think there should be some upper limit to the number of UUIDs being submitted in one go. We can always extend it later if people need it, but we can't reduce it later if larger batches end up causing problems.

Author

That's a good idea. I have tested it with up to 1000 UUIDs, so maybe that could be a reasonable max size to start with?

Member

Yeah that sounds reasonable.
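A sketch of how the agreed constraint could look on the resource method; the surrounding class is a hypothetical stand-in, and the max of 1000 follows from the exchange above.

```java
import jakarta.validation.constraints.Size;
import jakarta.ws.rs.core.Response;

import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for the actual resource class; only the annotated
// parameter matters here.
public class ProjectBatchDeleteSignatureSketch {

    // Bean Validation rejects an empty list or more than 1000 UUIDs before the
    // method body runs, and the limits appear in the generated OpenAPI spec.
    public Response deleteProjects(@Size(min = 1, max = 1000) final List<UUID> uuids) {
        // ... existing deletion logic ...
        return Response.noContent().build();
    }
}
```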

try (QueryManager qm = new QueryManager()) {
    for (Iterator<UUID> it = uuids.iterator(); it.hasNext();) {
        UUID uuid = it.next();
        final Project project = qm.getObjectByUuid(Project.class, uuid, Project.FetchGroup.ALL.name());
Member

What's the reason for using FetchGroup.ALL here?

Author

Indeed, that doesn't seem to be needed since Project.accessTeams is part of the default fetch group. ProjectQueryManager.hasAccess only needs that property from the project, so I'll change to the getObjectByUuid(Class, UUID) signature instead.
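A before/after sketch of the change described above (a fragment of the loop shown earlier; the second variable name is adjusted only so both lines can stand side by side):

```java
// Before: loads the project with FetchGroup.ALL, pulling in far more state than needed
final Project project = qm.getObjectByUuid(Project.class, uuid, Project.FetchGroup.ALL.name());

// After: the default fetch group already includes accessTeams, which is all
// that ProjectQueryManager.hasAccess needs
final Project projectDefaultFetch = qm.getObjectByUuid(Project.class, uuid);
```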

""");
executeAndCloseWithArray(sqlQuery, (Object) uuidsArray);

// The below has only been tested with Postgres, but should work on any RDBMS supporting SQL:1999
Member

For completeness, how many RDBMSes have you tested this with? Having tested at least H2, MSSQL, and PostgreSQL would be good. We have a Docker Compose file for MSSQL here: https://github.com/DependencyTrack/dependency-track/blob/master/dev/docker-compose.mssql.yml

Author

I've only tested it with H2 and PostgreSQL so far (H2 in unit tests, PostgreSQL in system tests) but I can try migrating my test data to MSSQL and test that as well. I'll get back to you with the results when done!

Member

No need to do a full load test or anything. We just need assurance that the queries work as expected. Sadly the small differences in SQL syntax between the different RDBMSes turn out to be tripwires more often than not. :D

Author

True, true, and true :) But I'll need some data in the schema to pass the condition that the UUID list is not empty, so I'll try to migrate the relatively small exported sample that I created with pg_sample.

@nscuro added the "enhancement" (New feature or request) label on Nov 19, 2024
@nscuro added this to the 4.13 milestone on Nov 19, 2024

codacy-production bot commented Nov 19, 2024

Coverage summary from Codacy

Coverage variation: +0.01% (target: -1.00%)
Diff coverage:      86.05% (target: 70.00%)

Coverage variation details:

                                    Coverable lines   Covered lines   Coverage
  Common ancestor commit (593b4f9)  22588             17880           79.16%
  Head commit (3450de7)             22674 (+86)       17951 (+71)     79.17% (+0.01%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

                        Coverable lines   Covered lines   Diff coverage
  Pull request (#4383)  86                74              86.05%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified> / <coverable lines added or modified> * 100%

Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close the issue "Make project life-cycle management more effective"
Participants: 2